Enterprise Database Systems
Apache Hadoop and MapReduce Essentials
Apache Hadoop
MapReduce Essentials

Apache Hadoop

Course Number:
df_ahmr_a01_it_enus
Lesson Objectives

Apache Hadoop

  • start the course
  • describe the basics of Hadoop
  • identify the major users of Hadoop, their end-user applications, and the results they obtain
  • identify the characteristics of Big Data
  • compare and contrast the traditional data sources and Big Data sources
  • describe the clustering and distributed computing concepts of Hadoop
  • describe the use of low-cost commodity servers in Big Data and their configuration as nodes in small- and large-scale Hadoop installations
  • describe Hadoop installation requirements
  • troubleshoot Hadoop installation issues
  • configure Hadoop installation
  • identify the features of third party Hadoop distributions
  • describe the creation and evolution of Hadoop and its related projects
  • describe the use of YARN in Hadoop cluster management
  • describe the components and functions of Hadoop
  • compare and contrast the different types of Hadoop data
  • describe the four different types of NoSQL cloud databases
  • describe the basics of the Hadoop Distributed File System
  • describe HDFS and basic HDFS navigation operations
  • perform file operations such as add and delete within HDFS (see the Java sketch after this list)
  • describe the basic principles of MapReduce and general mapping issues
  • specify the use of Pig and Hive in Hadoop MapReduce jobs
  • describe the use of MapReduce, MapReduce lifecycle, job client, job tracker, task tracker, map tasks, and reduce tasks
  • describe how Hadoop MapReduce handles and processes data, and the vocabulary of the MapReduce dataflow process
  • describe the process of mapping and reducing
  • describe the basic principles and uses of Hadoop
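
The HDFS objectives above can be illustrated with a minimal sketch that uses the Hadoop Java FileSystem API. The NameNode address (hdfs://localhost:9000), the class name HdfsFileOps, and the file paths are placeholder assumptions for illustration, not part of the course material.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
      // Point the client at the cluster; the NameNode URI is a placeholder.
      Configuration conf = new Configuration();
      conf.set("fs.defaultFS", "hdfs://localhost:9000");
      FileSystem fs = FileSystem.get(conf);

      // Add: copy a local file into HDFS.
      fs.copyFromLocalFile(new Path("/tmp/sample.txt"), new Path("/user/demo/sample.txt"));

      // Navigate: list the contents of an HDFS directory.
      for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
      }

      // Delete: remove the file (the boolean enables recursive delete for directories).
      fs.delete(new Path("/user/demo/sample.txt"), false);

      fs.close();
    }
  }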

Overview/Description
Apache Hadoop is a framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. This course introduces the basic concepts of Apache Hadoop, cloud computing, Big Data, and the development tools used with them.

Target Audience
This path is designed for developers, managers, database developers, and anyone interested in learning the basics of Hadoop, or cloud computing in general.

MapReduce Essentials

Course Number:
df_ahmr_a02_it_enus
Lesson Objectives

MapReduce Essentials

  • start the course
  • describe the job components and the steps of Hadoop MapReduce
  • identify how each MapReduce process is vital to the overall MapReduce algorithm through a conceptual example
  • configure Java to write Hadoop MapReduce jobs and identify the functionality of the classes within additional JARs
  • create and execute Hadoop MapReduce jobs, and perform compilation and running of MapReduce programs (see the word-count sketch after this list)
  • describe the basic features and functions of the programmatic steps in a Hadoop MapReduce job
  • describe the concept of MapReduce chaining and compare the input and output steps in MapReduce jobs
  • identify the precompile, compile, and run commands, and specify different techniques to package and run MapReduce jobs
  • describe how MapReduce stores and reads Big Data, and how MapReduce and Hadoop data are handled with HDFS over a distributed processing system
  • compare persistence in HDFS with other file storage systems, and describe the specifics of reading and writing data in HDFS and the redundancy of HDFS across the cluster
  • describe the basics of Apache Hive and HiveQL
  • classify the usage of the four file formats supported in Hive – TEXTFILE, SEQUENCEFILE, ORC, and RCFILE
  • describe how to write Hive jobs by using the custom Hive data types – arrays and maps (see the Hive sketch after this list)
  • describe how Pig is used to query data using Pig Latin, an SQL-like language
  • write Pig scripts, and describe the Pig, Local, MapReduce, and Batch modes
  • list the Pig commands such as LOAD, LIMIT, DUMP, and STORE for data read/write operators in Pig Latin (see the Pig sketch after this list)
  • compare and contrast the internals and performance, and analyze the strengths and weaknesses of MapReduce, Hive, and Pig
  • describe the jobs run in MapReduce, and the unit testing process, tools, and techniques
  • recognize MapReduce job status, review, and understand the log files of different distributions of Hadoop
  • identify the scenarios where a MapReduce job would need to be terminated, and apply the "-list" and "-kill" commands
  • define JUnit and JUnit configuration scripts, and identify testing techniques and test cases using JUnit
  • describe Cloudera MRUnit, the unit testing process, and unit testing files, and compare unit testing with and without MRUnit (see the test sketch after this list)
  • apply the use of a dummy cluster for unit and integration testing, and the basics of a mini HDFS and a mini MapReduce cluster
  • define the basics of the Hadoop LocalJobRunner
  • describe the basics of programming in MapReduce, Hive, and Pig
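
To accompany the objectives on writing, compiling, and running MapReduce jobs in Java, here is a minimal word-count sketch built on the org.apache.hadoop.mapreduce API. The class name WordCount and the command-line input/output paths are assumptions; the course's own exercises may be structured differently.

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {

    // Map task: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);
        }
      }
    }

    // Reduce task: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable result = new IntWritable();

      public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }

    // Job client: configure the job, submit it, and wait for completion.
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
      FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Packaged into a JAR, a job like this is typically launched with the hadoop jar command, which relates to the packaging and run-command objectives above.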
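
For the Hive objectives, the sketch below issues HiveQL from Java through the HiveServer2 JDBC driver. The connection URL, credentials, table name, and column layout are illustrative assumptions; the statements show one of the supported file formats (ORC) together with the array and map complex types.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class HiveFormatsExample {
    public static void main(String[] args) throws Exception {
      // HiveServer2 JDBC connection; requires the hive-jdbc driver on the classpath.
      // Host, port, database, and credentials below are placeholders.
      Connection con = DriverManager.getConnection(
          "jdbc:hive2://localhost:10000/default", "hive", "");
      Statement stmt = con.createStatement();

      // A table stored in one of the supported file formats (here ORC; TEXTFILE,
      // SEQUENCEFILE, and RCFILE are declared the same way), using the complex
      // types ARRAY and MAP.
      stmt.execute("CREATE TABLE IF NOT EXISTS employees ("
          + " name STRING,"
          + " skills ARRAY<STRING>,"
          + " ratings MAP<STRING, INT>)"
          + " STORED AS ORC");

      // HiveQL can index into arrays and maps directly.
      ResultSet rs = stmt.executeQuery(
          "SELECT name, skills[0], ratings['java'] FROM employees LIMIT 10");
      while (rs.next()) {
        System.out.println(rs.getString(1) + " " + rs.getString(2) + " " + rs.getInt(3));
      }

      rs.close();
      stmt.close();
      con.close();
    }
  }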
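
The Pig objectives can be sketched with Pig Latin embedded in Java through the PigServer API, shown here in local mode. The input file, schema, and output path are assumptions for illustration.

  import org.apache.pig.ExecType;
  import org.apache.pig.PigServer;

  public class PigLatinExample {
    public static void main(String[] args) throws Exception {
      // Local mode keeps the example self-contained; ExecType.MAPREDUCE runs against a cluster.
      PigServer pig = new PigServer(ExecType.LOCAL);

      // LOAD: read a comma-separated file into a relation with a declared schema.
      pig.registerQuery("records = LOAD 'input/scores.csv' USING PigStorage(',') "
          + "AS (name:chararray, score:int);");

      // LIMIT: keep only the first ten tuples.
      pig.registerQuery("sample = LIMIT records 10;");

      // STORE: write the result out (DUMP would print it to the console instead).
      pig.store("sample", "output/top10");
    }
  }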
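
Finally, the unit-testing objectives can be sketched with an MRUnit MapDriver test against the word-count mapper shown earlier. The test class name and the JUnit 4 style are assumptions about the course setup.

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mrunit.mapreduce.MapDriver;
  import org.junit.Before;
  import org.junit.Test;

  public class WordCountMapperTest {
    private MapDriver<Object, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
      // Wrap the mapper under test in an MRUnit driver; no real cluster is needed.
      mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
    }

    @Test
    public void mapperEmitsOneCountPerToken() throws Exception {
      mapDriver.withInput(new LongWritable(1), new Text("hadoop map reduce"))
               .withOutput(new Text("hadoop"), new IntWritable(1))
               .withOutput(new Text("map"), new IntWritable(1))
               .withOutput(new Text("reduce"), new IntWritable(1))
               .runTest();
    }
  }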

Overview/Description
MapReduce is a programming framework for processing parallelizable problems across huge datasets. This course defines MapReduce programming and explains the basics of programming in MapReduce, Hive, and Pig.

Target Audience
This path is designed for developers, managers, database developers, and anyone with a basic knowledge of Java interested in learning the basics of programming in MapReduce.
